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Abstract 

We propose a new method for dimension reduction in regression using the 
first two inverse moments. We develop corresponding weighted chi-squared 
tests for the dimension of the regression. The proposed method considers 
linear combinations of Sliced Inverse Regression (SIR) and the method using 
a new candidate matrix which is designed to recover the entire inverse second 
moment subspace. The optimal combination may be selected based on the p- 
values derived from the dimension tests. Theoretically, the proposed method, 
as well as Sliced Average Variance Estimate (SAVE), are more capable of 
recovering the complete central dimension reduction subspace than SIR and 
Principle Hessian Directions (pHd). Therefore it can substitute for SIR, pHd, 
SAVE, or any linear combination of them at a theoretical level. Simulation 
study indicates that the proposed method may have consistently greater 
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power than SIR, pHd, and SAVE. 
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1. Introduction 



The purpose of the regression of a univariate response y on a p-dimensional 
predictor vector x is to make inference on the conditional distribution of y|x. 



Following 



Cooki (Il998bl ). x can be replaced by its standardized version 



I-V2/ 



(1) 



where /z x and S x denote the mean and covariance matrix of x respectively 
assuming non-singularity of S x . 

The goal of dimension reduction in regression is to find out a.pxd matrix 
7 such that 



y _1L z\j'z 



(2) 



where " JL " indicates independence. Then the p-dimensional z can be re- 
placed by the d- dimensional vector 7'z without specifying any parametric 
model and without losing any information on predicting y. The column space 
Span{7} is called a dimension reduction subspace. The smallest applicable d 
is called the dimension of the regression. 



Based on the inverse mean E(z|^),|LJ (11991al ) proposed Sliced Inverse Re- 



gression (SIR) for dimension reduction in re gre s sion. 



can n ot recover the symmetric dependency (ILil . 



1991b 



t is realized that SIR 



Cook and Weisbergi . 



199ll ). After SIR, many dimension reduction methods have been introduced. 
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Sliced Average Variance Estimate (SAVE) proposed by ICook and Weisberg 



Ljj (11992 1 are 



(119911 ) and Principle Hessian Directions (pHd) proposed by 
another two popular ones. Both pHd and SAVE refer to the second in- 
verse moment, centered or non-centered. Compared with S AVE, pHd can 



not detect certain depend ency hidden in the second mo ment (Yin and Coo 



2002 



Ye and Weiss. 



20031 ) and the linear dependency (ILi . 



1992 



Cook 



1998a). 



Among those dimension reduction methods using only the first two inverse 
moments, SAVE seems to be the preferred one. Never theless, SAVE is not 



Ye and Weissl (120031 ) implied that a lin- 



always the winner. For example, 
ear combination of SIR and pHd may pe rform better than SAVE in some 



cases. It is not surprising since |Lj (Il991bl ) already suggested that a suitable 
combination of two different me thods might sharpen the dimension reduc- 
tion results. lYe and Weissl (120031 ) further proposed that a bootstrap method 



could be used to pick up the "best" linear combination of two known meth- 
ods, as well as the dimension of the regression, in the sense of the variability 
of the estimators, although lower variability u nder the boots t rap p rocedure 



Li and Wand (120071 ) pointed 



does not necessarily lead to a better estimator, 
out that linear combinations of two known methods selected by the bootstrap 
criterion may not perform as well as a single new method, their Directional 
Regression method (DR), even though the bootstrap one is computationally 
intensive. 

This article aims to develop a new class of, instead of a single one, di- 
mension reduction methods using only the first two inverse moments, as well 
as the corresponding large sample tests for the dimension of the regression 
and an efficient criterion for selecting a suitable candidate from the class. 
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Theoretically, it can cover SIR, pHd, SAVE and their linear combinations. 
Practically, it can achieve higher power in recovering the dimension reduction 
subspace. In Section 2, we review the necessary dimension reduction context. 
In Section 3, we introduce a simple candidate matrix M zz i\ y which targets the 
entire inverse second moment subspace. It is indeed the candidate matrix of 
an intermediate method between pHd and SAVE. In Section 4, we propose 
a new class of dimension reduction methods called Sliced Inverse Moment 
Regression (SIMR), along with weighted chi-squared tests for the dimension 
of the regression. In Section 5, we use SIMR to analyze a simulated exam- 
ple and illustrate how to select a good candidate of SIMR. Simulation study 
shows that SIMR may have consistently greater power than SIR, pHd, and 
SAVE, as wel 



( jCook and Ni 



as PR and another new method Inverse Regression Estimator 
20051 ) . In Section [6j a real example is used to illustrate how 
the proposed method works. It is implied that a class of dimension reduc- 
tion methods, along with a suitable criterion for choosing a good one among 
them, may be preferable in practice to any single method. We conclude this 
article with discussion and proofs of the results presented. 



2. Dimension Reduction Context 

2.1. Central Dimension Reduction Subspace (CDRS) 



Cook! (jl994bl . Il996l ) introduced the notion of central dimension reduction 
subspace (CDRS), denoted by S y \ z , which is the intersection of all dimension 
reduction subspaces. Under fairly weak restrictions, the CDRS S y \ z is still a 
dimension reduction subspace. 
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In this article, we always is a dimension reduction sub- 

space and that the columns of 7 is an orthonormal basis of S y \ z . In practice, 
we usually first transform the original data {xj} into their standardized ver- 
sion {zj} by replacing S x and /i x in flTJ with their usual sample estimates X x 
and /t x . Then we can estimate S y \ x by 

where S^z is an estimate of S^. Therefore, the goal of dimension reduction 
in regression is to find out the dimension of the regression d and the CDRS 
S y \ x = Span{7L 



Following iLil (I1991al ) and ICookl (jl998bl ). we also assume: (1) E(z\j'z 



P 7 z, where P 7 = 77', known as the linearity condition; (2) Var(z|7'z) = Q y , 
where Q 1 = I — P 7 , known as the constant covariance condition. These two 
conditions hold if z is normally distributed, although the normality is not 
necessary. 

2.2. Candidate Matrix 



Ye and Weissl ( 120031 ) introduced the concept of candidate matrix, which 
is a p x p matrix A satisfying A = P y AP T They showed that any eigenvector 
corresponding to any nonzero eigenvalue of A belongs to the CDRS Span{7}. 
Besides, the set of all candidate matrices, denoted by Ai, is closed under 
scalar multiplication, transpose, addition, multiplication, and thus under 
linear combination and expectation. 

They also showed that the matrices \pi{y) p>i{y) f ] and [^2(2/ ) — I] belong 
to M. for all y, where fi\(y) = E(z|y) and ^{y) = E(zz'|y). They proved 
that the symmetric matrices that SIR, SAVE, and y-pHd estimate all belong 
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to M: 



Msm = Var(E(z|y)) = E^fo)^)'] , 
Msave = E[(/-Var(z|y)) 2 ] 

= E(|>ii(y)^i(y)f + |>i 2 (y)-Il 2 

-ImMMvWMv) ~ : 1 - Ml/) - I][/ii(y)/ii(y)']) , 
M,_ pHd = E[(y - E(y))zz'] = E[y(^ 2 (y) - I)] . 

3. Candidate Matrix M zz ri y 

3.1. A Simple Candidate Matrix 

The matrices [^\{y) Hi{y)'] and [^(y) — I] are actually two fundamental 
components of Msir, Msave, and Mj^pHd (see Section [2T2]) . Msir only in- 
volves the first component [p,i(y)jii(y)'], while both Msave and M y _ piid share 
the second component [/^(y) — I]- Realizing that this common feature may 
lead to the connection between SAVE and pHd, we investigate the behavior 
of the matrix [/i 2 (y) — I]- To avoid the inconvenience due to E([/i 2 (y) — I]) = 0, 
we define 

M zz% = E([E(zz'-I|y)] 2 ) = E([My)-I] 2 ). 



Note t hat M zz i\ y takes a simpler form than the rescaled version of sirll (ILil . 



1991bl . Remark R.3) while still keeping the theoretical comprehensiveness. It 



also appea rs as a compo n ent in one expression of the directional regression 



matrix G (ILi and Wang 



20071 . eq.(4)). We choose its form as simple as 
possible for less complicated large sample test and potentially greater test 
power. To establish the relationship between M y _ p ^ d and M zz n y , we need: 
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Lemma 1. Let M be a p x q random matrix defined on a probability space 
(flyJ 7 , P), then there exists an event Sl G T with probability 1, such that, 

Span{£(MM')} = Span{M(c;), u G fi }. 



A similar result can also be found in lYin and Cookl (120031 . Proposition 2(i)). 
The lemma here is more general. By the definition of M zz >\ y , 

Corollary 1. Span{M zz /|<,} = Span{[yU 2 (y) — I], y G fi,(y)}, where £l(y) is 
the support ofy. 



Based on Corollary [T], lYe and Weisd (120031 . Lemma 3), and the fact that 
\p2(y) — I] G M. for all y, matrix M zz n y is in fact a candidate matrix too. 
Corollary [T] also implies a strong connection between M^.pHd and M zz i\ y : 



Corollary 2. Span{Mj / _ pH d} ^ Sp&n{M zz >\ y } . 



To further understand the relationship between M % 



the central k-th moment dimension reduction subspace ( lYin and Cookl . 120031 ). 



pHH and M 7 , 7 / | ,,, reca ll 



Syl = Span{r/ fc )}. The corresponding random vector (r]^)'z contains all the 
available information about y from the first k conditional moments of y\z. 
In other words, y JL {E(y\z), . . . , E(y fc |z)}| (r^^^'z. Similar to 

Span{E(yz), . . . , E(y k z)} = Span{E(^ 1 (y)), . . . , E(?/Vi (</))} Q ^ C S v \ a , 



the subspace Sp an-fEfj/^f?/) 



S ( f } . Parallel to 



1), . . . , E(y k [fi 2 (y) — I])} is also contained in 



Yin and Cookl (120021 . Proposition 4), the result on M zz i\ y is: 
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Proposition 1. (a) If y has finite support Q(y) = {a Q , . . . , a^}, then 
Span{M zz /| y } = Span{E[?/ 4 (/! 2 (y) - I)], i = 1, . . . , k}. 

(b) If y is continuous and ^(y) is continuous on y's support £l(y), then 
Span{M zzl2/ } = Span{E[yV 2 (</) - I)], i = 1, 2, . . .}. 



According to Proposition [T] and lYin and Cookl (120021 . Proposition 4), the 



relationship between E[y(n 2 (y) — T)] = M y _ pH d and M zz i\ y is fairly comparable 
with the relationship between E(y/j l i(y)) = E(yz) and Msir. Both E(yz) and 



My_ p ud ac tually target t 



subspace ( ICook and Lil . 



re cen tral mean (first moment) dimension reduction 



2002), while Msir and M xx i\ y target the central k- 
th moment dimension reduction subspace given any k, or equivalently the 
CDRS S y \ z as k goes to infinite. In order to understand the similarity from 



anot 



ler perspective, recall the inverse mean subspace of S y \ z (jYin and Cookl . 



2002|): 



SeO%) = Span{E(z|y), y G £%)}• 
Similarly, we define the inverse second moment subspace of S y \ z : 

Span{E(zz / | 2/ )-I,yGfi(y)}. 

By definition, matrices Msir and M zz >\ y are designed to recover the entire 
inverse mean subspace and the entire inverse second moment subspace re- 
spectively, while E(yz) and M y „ P Hd are only able to recover portions of those 
subspaces. We are therefore interested in combining matrices Msir and M zz n y 
because they are both comprehensive. 
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3.2. SAVE versus SIR and pHd 



Ye and Weisd (120031 ) showed that 

Span{M S i R } C Span{M SAV E}, (3) 
We then prove further the following proposition: 
Proposition 2. SpanjMsAVE} = SpanjMsiR,} + Sp&n{M zz i\ y }. 
A straightforward result following Proposition [2] and Corollary [2] is: 
Corollary 3. Span{My_ pH d}, Span{M S i R }, Span{M zz / |2/ } C Span{M S AVE}- 

Corollary [3] explains why SAVE is able to provide better estimates of the 
CDRS than SIR and y-pHd in many cases. 

4. Sliced Inverse Moment Regression Using Weighted Chi-Squared 
Tests 

4-1- Sliced Inverse Moment Regression 

In order to simplify the candidate matrices using the first two inverse 
moments and still keep the comprehensiveness of SAVE, a natural idea is to 
combine M zz /i y with Msir as follows: 



2x 



aMgiR + (1 - a)M zz , ly = E(a\pn(y) ^(y)'] + (1 - a)[^{y) - I] 
where a G (0, 1). We call this matrix Mg^ IR and the corresponding dimen- 



sion reduction method Sliced Inverse Moment Regression (SIMR or S 



Note that the combination h ere is simpler than the SIR Q method (ILi 



MR J . 



1991b 



Gannoun and Saracco 



20031 ) while retaining the least requirement on com- 
prehensiveness. Actually, for any a G (0,1), SIMR a is as comprehensive as 
SAVE at a theoretical level based on the following proposition: 



Proposition 3. Span{Mg^ R } = Span{M S AVE}, Va G (0, 1). 

Combined with Corollary [3j we know that any linear combination of SIR, 
pHd and SAVE can be covered by SIMR Q : 

Corollary 4. Span{aMsiR + &M y _ P Hd + cM S ave} Q SpanjMg"^}, where 
a, b, and c are arbitrary real numbers. 

Note that the way of constructing SIMR a makes it easier to develop a corre- 
sponding large sample test for the dimension of the regression (Section 4.3). 

From now on, we assume that the data {{yi,^i)}i=i,..., n are i.i.d. from a 
population which has finite first four moments and conditional moments. 

4.2. Algorithm for SIMR a 

Given i.i.d. sample (yi, xi),...,(?/ n , x n ), first standardize x; into Zj, sort the 
data by y, and divide the data into H slices with intraslice sample sizes n^, 
h = 1, . . . , H . Secondly construct the intraslice sample means {zz') h and z^: 





where z^'s are predictors falling into slice h. Thirdly calculate 



H 



SIMR 



(a) 



J2 h ((1 - «)[(zz'), - y " y + aftJftJ') 



h=l 




where fh — nh/n and 



U n = (..., y/l-a [(zz\ - l p )J f h , . . . , ...,^/a z h J f h 




) 



px(pH+H) 
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Finally calculate the eigenvalues Ai > ■ ■ ■ > A p of Msimr ano - the corre- 
sponding eigenvectors 71, . . . ,j p . Then Span{7i, . . . , 7^} is an estimate of 
the CDRS Span{7}, where d is determined by the weighted chi-squared test 
described in the next section. 

4.3. A Weighted Chi- Squared Test for SIMR a 
Define the population version of U n : 

B 

= (...,VT^ [E(zz'|y = /i)-y v ^,..., v ^E(z|y = /i) v ^:, 

Ddxd \ / (T' 21 )dx(pH+H) 



((rii)pxd, (ri2) P x(p-<o) 



(4) 



i] / \ (r 22 ) (pH+H-d)x(pH+H) 

where y is a slice indicator with y = h for all observations falling into slice h, 
fh = P(y = h) is the population version of fh, and (jlj) is the singular value 
decomposition of B. 

Denote U n = \fn{U n — B). By the multivariate central limit theorem 
and the multivariate version of Slutsky's theorem, U n converges in distri- 
buti on to a certain random p x (pH + H) matrix U as n goes to infin- 
ity fjGannoun and Saraccol . 120031 ). Note that the singular values are in- 
va riant under right and le ft multiplication by orthogonal matrices. Based 



on 



Eaton and Tylerl (11994 . Theorem 4.1 and 4.2), the asymptotic distribu- 



tion of the smallest (p — d) singular values of \/nU n is the same as the 
asymptotic distribution of the corresponding singular values of the following 
(p — d) x (pH + H — d) matrix: 



y/nT' 12 U n r 22 - 



(5) 
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Construct statistic 

p 

k d = n Ah, 

h=d+l 

which is the sum of the squared smallest (p — d) singular values of ^/nU n . 
Then the asymptotic distribution of is the same as that of the sum of the 
squared singular values of (jSJ). That is 

nTrace([r , 12 f> n r 22 ][r / 12 f> n r 22 ] / ) =n[Vec(r , 12 f> n r 22 )] / [Vec(r , 12 ?7 n r 22 )], 

where Vec(v4 rxc ) denotes (a/, . . . , a c ')' rcxl f° r an y matrix A = (ai, . . . , a c ). 
By central limit theorem and Slutsky's theorem again, 

^H+pH) 

for some nonrandom (p 2 if + pif ) x (p 2 if + pH) matrix V. Thus, 
V^[Vec(r' 12 f) n r 22 )] -4 iV (p 

-d)(pH+H-d){0, W), 

where W = [n, 2 ®r i2 ]V [1^®^]' is a ( P -d)(pH+H-d)x(p-d)(pH+H-d) 
matrix. Combined with Slutsky's theorem, it yields the following theorem: 

Theorem 1. The asymptotic distribution of A^ is the same as that of 

(p-d)(pH+H-d) 

1=1 

where the Ki 's are independent \\ random variables, and ai 's are the eigen- 
values of the matrix W. 

Clearly, a consistent estimate of W is needed for testing the dimension of 
the regression based on Theorem [TJ The way we define Mg^ R allows us to 
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partition U n into 

UnA : 



(...,vT^[(«zO fc -y >/&>•••>) 



pxpH 



u, 



11,2 



Va z h \/ /h,... 



pxH 



pxpH 



pxH 



The asymptotic distri bution of the matrix [/ n .2 has been fully explored by 
Bura and Cookl (120011 ). resulting in a weighted chi-squared test for SIR. The 
similar techniques can also be applied on the matrix U nt i, and therefore the 
matrix U n as a whole, although the details are much more complicated. 
Define the population versions of U n> i and Z7 nj2 , 

B l = (. . . , v 7 !^ [E(zz'|y = /i) - I p ] JJ h 

E 2 = (. . . , y/a E(z\y = h)s/Yh, ■ ■ ■ 

Then U n = (u nA , U n ^, and B = (B x , B 2 ). 

Let /, / and 1# be H x 1 vectors with elements //, and 1 respectively; 
let G and G be H x H diagonal matrices with diagonal entries yfjh an d \ffh 
respectively; and let 

F = {\ H -fl' H ) ) F = (l H -fl' H ), 
( r 2i) \ [ (r' 2 n)dxpH (F' 212 ) dxH 

(T22) J \ ( r 22l) ( P H+H-d)xpH (■^222) (pH+H—d) xH 

Finally, define four matrices 

M = (...,E(x\y = h),...) pxH , 

N = (...,E(x'\y = h),...) lxpH = Vec(MY, 

O = (...,E(xx!\y = h),...) pXp H, 

C = [O - M(I H ® // x ) - ^N] pxpH , 



13 



and their corresponding sample versions M n , N n , O n , and C n . By the central 
limit theorem, 

v^Vec([(C n , M n ) - (C, M)}) S N {p 

for a nonrandom (p 2 H + pH) x (p 2 H + pH) matrix A. As a result, 

Theorem 2. The covariance matrix in Theorem^ is 

W = (KT 22 )' ® (r' 12 E-V2) A(KT 22 ) ® (T' 12 ^ 1/2 )' f 

where 



K 



VT^(FG) ® S x 1/2 

y/a FG 



The only difficulty left now is to obtain a consistent estimate of A. By 
the central limit theorem, 

v^Vec([(O n ,M n ,/i x ) - (0,M,/i x )]) -4 N {p 2 H+pH+p) (0,A ) 

where Ao is a nonrandom (p 2 H + pi/ + p) x (p 2 ii + pif + p) matrix, with 
details shown in the Appendix. On the other hand, 

V <n M \ ( lp2H ~ lH ® Ax ® I P - W ® Ax \ 

Vec(C7 n ,M n ) = Vec(O n ,Af n ,/x x ) 

V o i pH o y 

= g([Vec(O n ,M n ,fr x )}) 
for a certain mapping g : n^ H+pH+ ^ -> ^(p 2 ^+p^) suc h that 

Vec(C,M) = #([Vec(0,M,/i x )]). 
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Thus the close form of A can be obtained by Cramer's theorem (jCramerl . 
1946|): 



A = [<7([Vec(O,M, /Ux )])]A [<7([Vec(O,M,/i x )])]', 
where the (p 2 H + pH) x (p 2 H + pH + p) derivative matrix 

W -Iff ® A*x ® I P - W ® A*x £13 



(6) 



<?[Vec(0,M,/i x )] 







(7) 



lp H 

with £ 13 = -(..., l p ® E(x'|£ = /i), . . .)' - Vec(M) ® I p . 

In summary, to compose a consistent estimate of matrix VF, one can (i) 
substitute the usual sample moments to get the sample estimate of A ; (ii) 
estimate A by substituting the usual sample estimates for E(x'|y = h), /i x 
and M in (jB]) and (J7J); (iii) obtain the usual sample estimates of T 12 and T 2 2 
from the singular value decomposition of U n ; (iv) substitute the usual sample 
estimates for F, G, E x , Ti2 and 1^22 in Theorem [2] to form an estimate of 
W. Note that both A and A do not rely on a. This fact can save a lot of 
computational time when multiple a's need to be checked. 

To approximate a linear combinati on of chi-squa r ed ra n dom variables , 



one may use the statistic pro posed by 



Sattert lwaite 



(1941) 



Wood! fll989f ). 



Satorra and Bentlerl (119941 ). or lBentler and Xid (120001 ) . In the next applica- 
tions, we will present tests based on Satterthwaite's statistic for illustration 
purpose. 

4-4 - Choosing Optimal a 

Ye and WeisJ ( 2003 ) proposed a bootstrap method to pick up the "best" 



linear combination of two known methods in terms of variability of the esti- 
mated CDRS S y \ z . The bootstrap method works reasonably well with known 
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dimension d of the regression, although less variability may occur with a 
wrong d (see Se ction [5] for an examp le). Another drawback is its computa- 



tional intensity (ILi and Wang . 



2003). 



Alternative criterion for "optimal" a is based on the weighted chi-squared 
tests developed for SIMR. When multiple tests with different a report the 
same dimension d, we simply pick up the a with the smallest p-value. Given 
that the true dimension d is detected, the last eigenvector 7^ added into 
the estimated CDRS with such an a is the most significant one among the 
candidates based on different a. In the mean time, the other eigenvectors 
71, ... , 7rf_i with selected a tend to be more significant than other candidates 
too. Based on simulation studies (Section [5]), the performance of the p- value 
criterion is comparable with the bootstrap one with known d. The advantages 
of the former include that it is compatible with the weighted chi-squared tests 
and it requires much less computation. 

When a model or an algorithm is specified for the data analysis, cross- 



validation could be used for choosing o ptimal a too 



for model selection. For example, see 



Hastie et al 



ust 



(12001 



ike how people did 
chap. 7). It will 



not be covered in this paper since we aim at model-free dimension reduction. 



5. Simulation Study 

5.1. A Simulated Example 

Let the response y = 2z\e + z\ + z 3 , where (z', e)' = (zi, z 2 , z 3 , z±, e)' are 
i.i.d sample from the iVs(0, 15) distribution. Then the true dimension of the 
regression is 3 and the true CDRS is spanned by (1, 0, 0, 0)', (0, 1, 0, 0)', and 
(0, 0, 1, 0)', that is, Z\, z 2 and z 3 . 
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Theoretically, M sm = Diag{0,0, Var(E(z 3 |y)), 0}, M y _ pUd = Diag{0, 2, 
0, 0}, and M r _ P Hd = Diag{0, 2, 0, 0} have rank one and therefore are only 
able to find a one-dimensional proper subspa ce of the CDRS. Th e linear 
combination of any two of them suggested by lYe and Weissl (120031 ) can at 



most find a two-dimensional proper subspace of the CDRS. On the contrary, 
both SAVE and SIMR are able to recover the complete CDRS at a theoretical 
level. 

5.2. A Single Simulation 

We begin with a single simulation with sample size n = 400. SIR, r-pHd, 
SAVE and SIMR are applied to the data. Number of slices H = 10 are 



2002 



2009 . 



used for SIR, SAVE, and SIMR. The R package dr flWeisberg 
version 3.0.3) is used for SIR, r-pHd, SAVE, as well as their corresponding 
marginal dimension tests. SIMR a with a = 0,0.01,0.05, 0.1 ~ 0.9 paced by 
0.1, 0.95,0.99,1 are applied. 

For this typical simulation, SIR identifies only the direction (.018, .000, 
— .999, —.035)'. It is roughly z%, the linear trend. r-pHd identifies only the 
direction (.011, .999, —.038, —.020)', which is roughly z 2 , the quadratic com- 
ponent. As expected, SAVE works bett er. It identifies zo and z\. However, 



the marginal dimension tests for SAVE (IShao et al.l . 120071 ) fail to detect the 



third predictor, z 3 . The p-value of the corresponding test is 0.331. 

Roughly speaking, SAVE with its marginal dimension test is comparable 
with SIMRq.i in this case. The comparison between SAVE and SIMR a sug- 
gests that the failure of SAVE might due to its weights combining the first 
and second inverse moments. As a increases, SIMR a with a between 0.3 and 
0.8 all succeed in detecting all the three effective predictors Z\, z 2 and z 3 . 
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The CDRS estimated by those candidate matrices are similar to each other, 
which implies that the results with different a are fairly consistent. The 
major difference among SIMR a is that the order of the detected predictors 
changes roughly from {z 2 , Zi, z 3 } to {z 3 , z 2 ? ^1} as a increases from 0.3 to 0.8. 
As expected, SIMR a is comparable with SIR if a is close to 1. 

For this particular simulation, SIMR Q with a between 0.3 and 0.8 are first 
selected. If we know the true CDRS, the optimal a is the one minimizing 
t he distance betwee n the estimated CDRS and the true CDRS. Following 
e and Weissl ( 120031 . p. 974), the three distance measures arccos(g), 1 — 
q, 1 — r behave similarly and imply the same a = 0.6 for this particular 
simulation. Since the true CDRS is unknown, bootstrap criterion and p- 
value criterion (Section 14.41) are applied instead. 

The left panel of Figured] shows the variability of bootstrapped estimated 
CDRS. Distance 1 — r is used because it is comparable across different di- 
mensions. The minimum variability is attained at d = 3 and a = 0.6, which 
happens to the optimal one based on the truth. Another 200 simulations 
reveal that about 75% "optimal" a based on bootstrap fall in 0.5 ~ 0.6. 
SIMR with a chosen by bootstrap criterion attains 1 — r = 0.0086 away from 
the true CDRS on average. Note that low variability not necessarily implies 
that the estimated CDRS is accurate. For example, SIMRi or SIR can only 
detect one direction z%. However the estimated one-dimensional CDRS is 
fairly stable under bootstrapping (see Figured]). 

The right panel in Figure [I] shows that the p- value criterion also picks up 
a = 0.6 for this single simulation (check the line d = 3, which is the highest 
one that still goes below the significance level 0.05). Based on the same 



18 



200 simulations, about 80% of the "best" a selected by p-value criterion fall 
between 0.4 and 0.7. On average, SIMR with a selected by p-values attains 
1 — r = 0.0082, which is comparable with the bootstrap ones. 



5.3. Power Analysis 

We conduct 1000 independent simulations and summarize in Table [1] the 
empirical powers and sizes of the marginal dimension tests with significance 
level 0.05 for SIR, SAVE, r-pHd, and SIMR a with a chosen by the p- value 
criterion. For illustration purpose, we omit the simulation results of y-pHd 
because there is little difference between y-pHd and r-pHd in this case. The 
empirical powers and sizes with significance level 0.01 are omitted too since 
their pattern is similar to Table [TJ 

In Table HJ the rows d < 0, d < 1, d < 2 an d d < 3 indicate different 



null hypotheses. Following 



Bura and Cookl (120011 ). the numerical entries in 



the rows d < 0, d < 1, and d < 2 are empirical estimates of the powers 
of the corresponding tests, while the entries in the row d < 3 are empirical 
estimates of the sizes of the tests. 

As expected, SIR claims d — 1 in most cases. r-pHd works a little 
better. At the significance level 0.05, r-pHd has about 30% chance to find 
out d>2 (Table [1]). At level 0.01, the chance shrinks to about 15%. Both 
SAVE and SIMR perform much better than SIR and pHd. Compared with 
SAVE, SIMR has consistently greater powers for the null hypotheses d < 0, 
d < 1 and d < 2 across different choices of sample size, number of slices and 
significant level. For example, under the null hypothesis d < 2 with sample 
size 400, the empirical powers of SIMR at level 0.05 are 0.939 under 5 slices 
and 0.943 under 10 slices, while the corresponding powers of SAVE are only 
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0.399 and 0.213 respectively (Tabled]). Those differences become even bigger 
at level 0.01. The empirical sizes of SIMR are roughly under the nominal 
size 0.05 although they tend to be larger than the others. 



For compa r ison 



( ICook and Ni 



j urpose, the methods inverse regression estimator (IRE) 



2005 



regression (DR) ( 



Wen and Coo 



Li and Wang 



2007 



Weisberg 



20091 )) and directional 



20071 ) are also applied. Roughly speaking, 



IRE performs similar to SIR in this example. Given that the truth dimension 
d = 3 is known, both DR and SIMR are among the best in terms of mean(l — 
r). For example, at n = 600, DR achieves mean(l — r) = 0.0050 with H = 5, 
0.0053 with H = 10 and 0.0059 with H = 15, while SIMR's are 0.0048, 
0.0046, and 0.0053. Nevertheless, the powers of the marginal tests for DR 
are between SAVE and SIMR in this case. Roughly speaking, DR's power 
tests are comparable with SIMR Q 's with a between 0.2 and 0.3. For example, 
at H = 10 and level 0.05, the empirical powers of DR against d < 2 are 0.247 
with n = 200, 0.800 with n = 400, and 0.974 with n = 600. 

Among the six dimension reduction methods applied, SIMR is the most 
reliable one. Besides, the chi-squared tests for SIMR do not seem to be very 
sensitive to the numbers of slices. Nevertheless, we suggest that the number 
of slices should not be greater than 3%-5% of the sample size based on the 
simulation results. 



6. A Real Example: Ozone Data 



T o examine how SIMR works i n practice, we consider a data set taken 



from 



Breiman and Friedman! (119851 ). The response Ozone is the daily ozone 



concentration in parts per million, measured in Los Angeles basin, for 330 
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days in 1976. For illustration purpose, the dependence of Ozone on the 
following four predictors is studied next: Height, Vandenburg 500 millibar 
height in meters; Humidity in percents; ITemp, Inverse base temperature 
in degrees Fahrenheit; and STemp, Sandburg Air Force Base temperature in 
degrees Fahrenheit. 

To meet both the linearity condition and the constant covariance condi- 
tion, simultaneously power transformations on the predictors are estimated to 
improve the normality of their joint distribution. After replacing Humidity, 
ITemp, and STemp with Humidity 1 ' 68 , ITemp 1 ' 25 , and STemp 1 ' 11 respectively, 
SIR, r-pHd, SAVE and SIMR are applied to the data. For SIR, SAVE, 
and SIMR, various numbers of slices are applied, and the results are fairly 
consistent. Here we only present the outputs based on H = 8. 

At significance level 0.05, SIR suggests the dimension of the regression 
d = 1, while r-pHd claims d = 2. Us i ng the visualization tools described by 



Cook and Weisbergi (119941 ) and lCook! (jl998bl ). the first pHd predictor appears 



to be somewhat symmetric about the response Ozone, and the second pHd 
predictor seems to be similar to the first SIR predictor, which are not shown 
in this article. The symmetric dependency explains why SIR is not able to 
find the first pHd predictor. The resulting inference based on pHd is therefore 
more reliable than the inference based on SIR. 

When checking the predictors of SAVE, visual tools show a clear quadratic 
or even higher order polynomial dependency between the response and the 
first SAVE predictor. The second SAVE predictor is similar to the second 
pHd predictor, and the third SAVE predictor is similar to the first pHd 
predictor. Both SIR's and pHd's tests miss the first SAVE predictor. 
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Now apply SIMR to the ozone data. Bootstrap criterion picks up a = 0.2 
while p-value criterion suggests a = 0. Nevertheless, both SIMR0.2 and 
SIMRo lead to very similar estimated CDRS in this case (see Table [2]). As 
expected , they recovers all the three SAVE predictors. Actually, those three 
estimated CDRS appear to be almost identical. 

7. Discussion 

SIMR a and SAVE are theoretically equivalent since that the subspaces 
spanned by their underlying matrices are identical. Nevertheless, simulation 
study shows that SIMR a with some chosen a may perform better than SAVE. 
The main reason is that SAVE is only a fixed combination of the first two 
inverse moments. The simulation example in Section implies that any 
fixed combination can not always be the winner. Apparently, SIMR .6 can 
not always be the winner either. For example, if the simulation example is 
changed to y — 2z\E + z\ + O.I23, SIMR a with a closer to 1 will perform 
better. For practical use, multiple methods, as well as their combinations, 
should be tried and unified. SIMRq, with a G (0, 1) provide a simple solution 
to it. 

As a conclusion, we propose SIMR using weighted chi-squared tests as an 
important class of dimension reduction methods, which should be routinely 
considered during the search for the central dimension reduction subspace 
and its dimension. 

Appendix 

Proof of Lemma [It By definition, Span{E(MM')} C Span{M(cj), cu 6 
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Qo}, if P(f^o) — 1- On the other hand, for any t> pxl 7^ 0, 

v'E(M(u)M'(u)) = v'E(M(u)M'(w))v = 
E([v'M(lo))[v'M(u))') = =>• [v'M(o;)] = 0, with probability 1 

Since {t> : d'E(MM') = 0} only has finite dimension, there exists an Qq with 
probability 1, such that, 

dim(Span{E(M(u;)M'(u;))}) > dim(Span{M(cj), u G fi })- 
Thus, Span{E(M(u;)M'(w))} = Span{M(c;), u G fi } 
Proof of Corollary [2} 

Span{M y _ P Hd = E[y(fi 2 (y) - 1)]} C Span{[// 2 (y) - I], Vy} = Span{M zz /| y }. 

Proof Proposition [Q Define ^ = E[(zz' — = a*] = E(zz'|y = a,j) — 
I and /j = Pr(y = Oj) for z = 0,...k, then Ef =0 /j = 1 and E^ =0 /j/ij = 
E((zz' — I)) = 0. Th e rest of the steps follow the exactly same proof as in 
Yin and Cookl (hood . A.3. Proposition 4). 

Proof of Proposition [2t By Lemma [T], 

Span{M SA vE} = Span{[^i(2/)/ii(j/) / + My) - I)], Vy} 

C Span{/ii(y),Vy} + Span{(// 2 (y) -I),Vj/} 
= Span{M SIR } + Span{M zz ^} 

C Span{M SIR } + [Span{/ii(y)^i(3/)' + (fi 2 (y) - I), Vy} 

+Span{^i(y),Vy}] 
C Span{M S i R } + Span{M SA vE} + Span{M S i R } 
= Span{M SAV E}- 
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Proof of Proposition By Lemma [Q, 

SpanjM^jJ = Span{(/ii(y), \p 2 (y) -I]),V|/} 

= Span{/i! (y) , Vy} + Span{ [/i 2 (y ) - I] , Vy} 

= Span{M S i R } + Span{M zz /| y } 

= Span{M SA vE}- 

Proof of Theorem |2 Actually, 5 = E X 1/2 (C, M)K, 

v^T^ (FG) <g> E x 1/2 \ 

y^FG ) 

Note that (P^-Si, r' 12 -B 2 ) = 0( p -d)x( P H+H), -£^221 + -821^222 = ® P x( P H+H-d), 
Span{C"£ x 1/2 r 12 } C Span{l^ ® I p }, Span{M'E x 1/2 r 12 } C Span{l H }, 
= 0, l'^F = 0. Writing \ p = E X 1/2 E X /2 , 

v / nr / 12 f/ n r 22 
= v / wr' 12 c/ n) ir 2 2i + v / wr' 12 c/ rij2 r 222 

= v/T^ v^r' 12 (i p - I p + I^E-V^C^ -C + C) [(FG -FG + FG) ® l p ] 

(i H ®zz 1/2 )[i H ® (i;-i p + i p )]r 221 + v^r' 12 (i p -i p + g 

E X 1 / 2 (M„ - M + M)(FG - FG + FG)T 222 

= VT^ ^' 12 zz 1/2 (c n - c)[fg ® y (i H ® £- 1/2 )r 221 

+ v / o : ^r' 12 E^ 1 / 2 (M n - M)FGT 222 + O p (n" 1 / 2 ) 
= ^^-^[(C^M^-^M)]^, + O p (n- 1/2 ). 

Therefore, the asymptotic distribution of T' l2 U n T 22 is determined only by 
the asymptotic distribution of (C n ,M n ). 
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U„ 



The detail of A , (p 2 H + pH + p) x (p 2 H + pH + p): 





/ 


A 1 ' 1 


A 1 ' 2 


A 1 ' 3 


\ 


A = 




A 2 ' 1 

^0 


A 2 ' 2 


A 2 ' 3 






V 


A 3 ' 1 


A 3 ' 2 


a 3,3 
^0 


/ 



where A*' 1 = diag {. . . , Cov (Vec(xx')|y = h) /f h , . . .} , p 2 H x p 2 H; A 2 ' 1 = 
diag{...,Cov(x,Vec(xx')|y = /i)/A,...}, P H x p 2 H; A 2 ' 2 = diag{ 
Cov(x|y = h)/f h , . . .}, pH x pH; A 3 ' 1 = [..., Cov (x, Vec(xx') \y = h) , . . .] , 
p X p 2 H; A 3 ' 2 = [..., Cov (x|y = h),...}, pxpH; A 3 ' 3 = S x , p X p; A*' 2 = 
(A 2 ' 1 )'; Aq' 3 = (A 3 / 1 )'; A 2 ' 3 =(A 3 < 2 )'. 
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Table 1 : Empirical Power and Size of Marginal Dimension Tests for SIR, SAVE, SIMR a 
with a Chosen by p- Value Criterion, and r-pHd, as Well as Mean of 1 — r Distances between 
Estimated 3-Dim CDRS and True CDRS, Based on 1000 Simulations (Significance Level: 
0.05; Sample Size: 200, 400, 600; Number of Slices: 5, 10, 15) 
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0.354 
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mcan(l — 


r) 


0.124 


0.127 


0.119 


0.045 


0.060 


0.077 


0.033 


0.033 


0.039 


0.111 












n= 


400 


















SIR 




SAVE 


SIMR a 


r-pHd 


Slice 




5 


10 


15 


5 


10 


15 


5 


10 


15 




d < 




1.000 


1.000 


1.000 


1.000 


1.000 


1.000 


1.000 


1.000 


1.000 


1.000 


d < 1 




0.039 


0.050 


0.108 


0.983 


0.974 


0.888 


1.000 


1.000 


0.993 


0.293 


d < 2 




0.003 


0.001 


0.012 


0.399 


0.213 


0.091 


0.939 


0.943 


0.860 


0.026 


d < 3 




0.001 


0.000 


0.000 


0.015 


0.013 


0.010 


0.052 


0.040 


0.033 


0.002 


mean(l — 


r) 


0.127 


0.129 


0.120 


0.016 


0.025 


0.038 


0.009 


0.009 


0.011 


0.109 












n= 


600 


















SIR 




SAVE 


SIMR a 


r-pHd 


Slice 




5 


10 


15 


5 


10 


15 


5 


10 


15 




d < 




1.000 


1.000 


1.000 


1.000 


1.000 


1.000 


1.000 


1.000 


1.000 


1.000 


d < 1 




0.054 


0.062 


0.053 


1.000 


1.000 


0.998 


1.000 


1.000 


1.000 


0.328 


d<2 




0.001 


0.000 


0.002 


0.841 


0.601 


0.371 


0.996 


1.000 


0.992 


0.040 


d < 3 




0.001 


0.000 


0.000 


0.021 


0.019 


0.013 


0.048 


0.034 


0.031 


0.006 


mean(l — 


r) 


0.123 


0.123 


0.125 


0.008 


0.010 


0.016 


0.005 


0.005 


0.005 


0.108 



29 



Figure 1: Optimal a according to variability of 200 bootstrapped estimated CDRS (left 
panel, d = 3 indicates the first 3 eigenvectors considered, and so on) or p- values of weighted 
chi-squared tests (right panel, d = 3 indicates the test d < 2 versus d > 3, and so on) 
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Table 2: Ozone Data: Estimated CDRS by r-pHd, SAVE, SIMR , and SIMR . 2 (H = 10 
for SAVE and SIMR) 



First 


Second 


Third 


Fourth 


First 


Second 


Third 


Fourth 


r-pHd -.113 


0.333 


( 0.183) 


(-.194) 


SAVE 0.635 


0.126 


0.096 


(-.124) 


-.049 


0.084 


( -.018) 


(-.012) 


-.026 


-.031 


0.015 


(-.026) 


0.826 


0.939 


( -.642) 


(-.030) 


-.665 


-.621 


-.664 


(-.143) 


-.551 


-.031 


( 0.745) 


(0.981) 


-.392 


-.773 


0.741 


(0.981) 


SIMRo 0.652 


0.169 


0.092 


(0.125) 


SIMR0.2 0.685 


0.204 


0.092 


(-.125) 
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-.032 


0.015 


(0.026) 
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0.015 


(-.026) 


-.662 


-.803 
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(0.137) 
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-.708 


-.653 


(-.141) 


-.369 


-.571 


0.758 


(-.982) 


-.322 


-.676 


0.751 


(0.982) 



Note: "(•)" indicates nonsignificant direction at level 0.05. 
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